Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PyPI JSON and "Simple" API instead of deprecated XML-RPC API #201

Merged
merged 4 commits into from
Jul 9, 2024

Conversation

nuclearsandwich
Copy link
Collaborator

The XML-RPC API is deprecated. Tests using it were failing due to what
seems like a malformed url. I have no idea if this API has already been
dropped or if there's some usage issue inside stdeb but I was not
motivated to investigate since the entire API is deprecated.

Instead I replaced the functionality with a new Requests HTTP session
client that uses the JSON representation of the Simple API and
the JSON API where best appropriate. It could all be done with the
JSON API but parts of it are also deprecated...

There is one regression in behavior, which is that packages with a
release on PyPI but no download url on PyPI will not be checked for an
external download url as this also deprecated, so I find this acceptable.

@nuclearsandwich nuclearsandwich requested a review from astraw July 9, 2024 00:04
The XML-RPC API is deprecated. Tests using it were failing due to what
seems like a malformed url. I have no idea if this API has already been
dropped or if there's some usage issue inside stdeb but I was not
motivated to investigate since the entire API is deprecated.

Instead I replaced the functionality with a new Requests HTTP session
client that uses the [JSON representation][1] of the [Simple API][2] and
the [JSON API][3] where best appropriate. It could all be done with the
JSON API but parts of it are also deprecated...

There is one regression in behavior, which is that packages with a
release on PyPI but no download url on PyPI will not be checked for an
external download url as this also deprecated.

[1]: https://peps.python.org/pep-0691/
[2]: https://warehouse.pypa.io/api-reference/legacy.html
[3]: https://warehouse.pypa.io/api-reference/json.html
Comment on lines -75 to -81
if download_url is None:
# PyPI doesn't have package. Is download URL provided?
result = _call(pypi.release_data, package_name, version)
if result['download_url'] != 'UNKNOWN':
download_url = result['download_url']
# no download URL provided, see if PyPI itself has download
urls = _call(pypi.release_urls, result['name'], result['version'])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the behavior for which I could find no sensible replacement in the current PyPI APIs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think the concept carries over to the new API.

if verbose >= 2:
myprint(
'found default release: %s' % (', '.join(default_releases),))
'found default release: %s' % (', '.join(default_release),))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually a long-lived typo and the intent was always to log the single default release found.

@nuclearsandwich
Copy link
Collaborator Author

It's not complete because I'm juggling a number of other changes to fix the test scripts but with this and a few other changes (mostly related to python vs python3 and environmental/dependency updates I was able to get as far as running py2dsc on psycopg 2.7 before it fails to find a pg_config executable which is much deeper into the test script than I was previously able to get.

@nuclearsandwich nuclearsandwich marked this pull request as ready for review July 9, 2024 00:13
Comment on lines 5 to 6
USER_AGENT = "pypi-install/%s ( https://github.com/astraw/stdeb )" % \
stdeb.__version__
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the only current use case passes in an explicit USER_AGENT anyway, it might be good to remove the duplicated string here and make the default behavior not specify a user agent at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to make this change but I agree with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 0787a84 I ended up importing the USER_AGENT constant from the downloader. I think adding a download method to the pypi client would also be another way to push this down. But this cuts the duplication although it keeps the default user agent.


def release_versions(self, package_name, current_only=False):
package_name = normalize_package_name(package_name)
if current_only:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two code paths determined by current_only share only the package name normalization which is already in a separate function. I'd recommend introducing a new function for the True path here, especially since the return type seems to change based on that, which is pretty confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point! I was culting forward a pretty straight port of the interface from the old XML-RPC transport, but even that was a carry-over from an interface that was not actually being used for what it is for. I've split the functions in 2d0a089 and the result is much better. Thanks for spotting it.

if current_only:
response = self._http_session.get("%s/pypi/%s/json" % (self.pypi_url, package_name))
data = response.json()
return data['info']['version']
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that all of the other string quotes in this file are double quotes. It would be nice to be consistent.

That said, the other file you're changing here seems to use single quotes. Your call.

Suggested change
return data['info']['version']
return data["info"]["version"]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think the ruff style which I'm exploring for this project uses double quotes ubiquitously and this project currently uses a mix of quote characters.

Fixed in 2d0a089

Comment on lines +40 to +44
if url["packagetype"] == "sdist" and \
url["python_version"] == "source" and \
url["url"].endswith((".tar.gz", ".zip")):
download_url = url["url"]
md5_digest = url["digests"]["md5"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original algorithm has a break here, meaning that the first encountered match is used. This continues to loop, meaning that the last match is used. Is that an intentional deviation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! No it isn't. I don't know that there would be a situation with multiple sdist payloads for a given version so I think the break should just be a minor optimization, but it is worth preserving the behavior. In the event that there are multiple sdist urls I'm not sure that the sorting from the server is reliable so we just have to trust that the first one we get is good enough.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored the break in 86d8edc

Comment on lines -75 to -81
if download_url is None:
# PyPI doesn't have package. Is download URL provided?
result = _call(pypi.release_data, package_name, version)
if result['download_url'] != 'UNKNOWN':
download_url = result['download_url']
# no download URL provided, see if PyPI itself has download
urls = _call(pypi.release_urls, result['name'], result['version'])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think the concept carries over to the new API.

I was tracking quite closely the XML-RPC API interface that was used for
this information previously which repurposed a "show_hidden" parameter
to return either the current release or all releases in the response.

Moving these to separate client methods is a much clearer internal
and external organization.
@nuclearsandwich nuclearsandwich requested a review from cottsay July 9, 2024 16:17
@nuclearsandwich nuclearsandwich added this to the 0.10.1 milestone Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants